[ENH] Add participant+sessions.tsv for session-varying participant metadata#2403
yarikoptic wants to merge 1 commit into
…tadata

Introduces a single new optional dataset-level file `participant+sessions.tsv` with a composite index `[participant_id, session_id]`. This provides a single top-level location for metadata that varies across both participants and sessions -- e.g. age at each visit, body weight, clinical scores in longitudinal studies -- complementing the existing `participants.tsv` (participant-constant) and per-subject `*_sessions.tsv` files.

Note that it is already possible to provide such metadata in the `sub-*/sub-*_sessions.tsv` files, so this approach just serves as a way to provide an "aggregate" collection of metadata. As such, we might then need to define how it interacts with the inheritance principle, but defining that is still a TODO in general for .tsv files.

The `+` in the filename signals a composite index, following the convention proposed in #2273 and as an alternative to the freshly proposed #2402, inspired by work on BEP036 - #2123, hence attn @bids-standard/bep036.

Most of the changes are just a straightforward interpolation of the `participants.tsv` and `sessions.tsv` file definitions.

One of the notable changes is to `meta/context.yaml`, where we added `dataset.sessions` (union of all session directories across subjects) to enable session-level validation checks. I think it is only reasonable given that we already include dataset-level summaries for datatypes and modalities. But it would require bids-validator to support it. The alternative is to drop it along with the extra check we added. Ideally, though, we should figure out how to validate specific combinations of sub/sessions, and a TODO was left for that.

An example `participant+sessions.tsv` with a `body_weight` column for the already existing `7t_trt` bids-examples dataset is at bids-standard/bids-examples#556 where, if you also look into the original `participants.tsv`, it becomes fairly obvious that duplicating all entries across all sessions would be dubious.
- implements a single first manifestation for #2273
- I think overall we can state that it closes #1020, which theoretically could have been closed with the original introduction of `_sessions.tsv` files.

Co-Authored-By: Claude Code 2.1.113 / Claude Opus 4.6 <noreply@anthropic.com>
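To make the composite index and the `dataset.sessions` idea concrete, here is a minimal sketch in Python. It is not the bids-validator implementation and not the schema rule from this PR. The function names `union_of_sessions` and `check_composite_index` are hypothetical, and the table values are made up; only the file layout (`participant_id`, `session_id`, plus a `body_weight` column) follows the description above.

```python
import csv
import io

# Hypothetical content of a participant+sessions.tsv file.
# The two index columns follow the PR description; all values are invented.
EXAMPLE_TSV = (
    "participant_id\tsession_id\tbody_weight\n"
    "sub-01\tses-1\t70.5\n"
    "sub-01\tses-2\t71.0\n"
    "sub-02\tses-1\t64.2\n"
)


def union_of_sessions(sessions_by_subject):
    """Mimic the proposed `dataset.sessions`: the union of all session
    directory names across subjects (assumed input: subject -> sessions)."""
    return set().union(*sessions_by_subject.values())


def check_composite_index(tsv_text, dataset_sessions):
    """Return a list of problems: duplicated (participant_id, session_id)
    pairs, and session_id values absent from dataset_sessions."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        key = (row["participant_id"], row["session_id"])
        if key in seen:
            problems.append(f"line {line_number}: duplicate index {key}")
        seen.add(key)
        if row["session_id"] not in dataset_sessions:
            problems.append(
                f"line {line_number}: unknown session {row['session_id']}"
            )
    return problems


sessions = union_of_sessions(
    {"sub-01": ["ses-1", "ses-2"], "sub-02": ["ses-1"]}
)
print(check_composite_index(EXAMPLE_TSV, sessions))
```

Note that a check against the union of sessions cannot catch a row for a session that exists in the dataset but not for that particular participant. That is the per-combination validation left as a TODO above.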
Codecov Report

✅ All modified and coverable lines are covered by tests.

| | master | #2403 |
|---|---|---|
| Coverage | 83.07% | 83.07% |
| Files | 22 | 22 |
| Lines | 1696 | 1696 |
| Hits | 1409 | 1409 |
| Misses | 287 | 287 |
|
Is the proposed Or is |
|
Thank you @dmoracze for your question! TL;DR: the full and exhaustive answer is "NO" and I added a TODO to the original description to elaborate on that! The quick extended answer was alluded to in the original description:
So we already have that mechanism, and it could be used in cases where it is more appropriate. And although we have yet to actually improve/formalize the inheritance principle for its application to .tsv files, and to this composite indexing in particular, such metadata could potentially then be present in both top level |
|
Ah ha, yes that makes sense. I'm both trying to mesh this enhancement with our group's typical use cases and brainstorming a way forward for #2123. We often curate datasets with multiple visits and data types, and many times each visit does not contain all data types. Take, for example, two visits where only survey/demographic data is collected, say pre/post drug administration, where MR scan sessions take place between the survey sessions. The current spec recommends aggregating these data into
Perhaps this is an edge case in the field as a whole, but we curate many datasets like this. The other option is to allow some type of composite index in TSVs stored in That is my current understanding of the situation, let me know if I'm misunderstanding your proposal. |
|
There is always a compromise to strike. That is why there is flexibility + "redundancy" (e.g. via IP + summarization) available in BIDS to cater to a wider range of use cases -- one size fits all might not work. in "unwieldy", in particular for
thanks for bringing it up too! I recall having some discussions or "brainstorming" toward generalization and allowing for "symmetry" of datatype folders (could also apply to |
|
re unwieldy again, since I am on a mission (I feel like) of promotion -- have you seen/tried https://www.visidata.org ?
|
Since I do not think this is something which should be merged without more extended discussion -- moved to draft "for protection"
Introduces a single new optional dataset-level file `participant+sessions.tsv` with a composite index `[participant_id, session_id]`. This provides a single top-level location for metadata that varies across both participants and sessions -- e.g. age at each visit, body weight, clinical scores in longitudinal studies -- complementing the existing `participants.tsv` (participant-constant) and per-subject `*_sessions.tsv` files.

Note that it is already possible to provide such metadata in the `sub-*/sub-*_sessions.tsv` files, so this approach just serves as a way to provide an "aggregate" collection of metadata. As such, we might then need to define how it interacts with the inheritance principle, but defining that is still a TODO in general for .tsv files.

The `+` in the filename signals a composite index, following the convention proposed in inspired by work on BEP036 (@bids-standard/bep036):

Most of the changes are just a straightforward interpolation of the `participants.tsv` and `sessions.tsv` file definitions.

One of the notable changes is to `meta/context.yaml`, where we added `dataset.sessions` (union of all session directories across subjects) to enable session-level validation checks. I think it is only reasonable given that we already include dataset-level summaries for datatypes and modalities. But it would require bids-validator to support it. The alternative is to drop it along with the extra check we added.

Ideally, though, we should figure out how to validate specific combinations of sub/sessions, and a TODO was left for that.

An example `participant+sessions.tsv` with a `body_weight` column for the already existing `7t_trt` bids-examples dataset is at where, if you also look into the original `participants.tsv`, it becomes fairly obvious that duplicating all entries across all sessions would be dubious.

TODOs

`sub-*/sub-*_sessions.tsv` files. Two possibilities: